Hybrid probabilistic sampling with random subspace for imbalanced data learning
نویسندگان
چکیده
Class imbalance is one of the challenging problems for machine learning in many real-world applications. Other issues, such as within-class imbalance and high dimensionality, can exacerbate the problem. We propose a method HPSDRS that combines two ideas: Hybrid Probabilistic Sampling technique ensemble with Diverse Random Subspace to address these issues. HPS improves the performance of traditional re-sampling algorithms with the aid of probability function, since it is not sufficient to simply manipulate the class sizes for imbalanced data with complex distribution. Moreover, DRS ensemble employs the minimum overlapping mechanism to provide diversity and weighted voting, so as to improve the generalization performance. The experimental results demonstrate that our method is efficient for learning from imbalanced data and can achieve better results than state-of-the-art methods for imbalanced data.
منابع مشابه
Ensemble-based hybrid probabilistic sampling for imbalanced data learning in lung nodule CAD
Classification plays a critical role in false positive reduction (FPR) in lung nodule computer aided detection (CAD). The difficulty of FPR lies in the variation of the appearances of the nodules, and the imbalance distribution between the nodule and non-nodule class. Moreover, the presence of inherent complex structures in data distribution, such as within-class imbalance and high-dimensionali...
متن کاملForesTexter: An efficient random forest algorithm for imbalanced text categorization
In this paper, we propose a new Random Forest (RF) based ensemble method, ForesTexter, to solve the imbalanced text categorization problems. RF has shown great success in many real-world applications. However, the problem of learning from text data with class imbalance is a relatively new challenge that needs to be addressed. A RF algorithm tends to use a simple random sampling of features in b...
متن کاملAn Effective Method for Imbalanced Time Series Classification: Hybrid Sampling
Most traditional supervised classification learning algorithms are ineffective for highly imbalanced time series classification, which has received considerably less attention than imbalanced data problems in data mining and machine learning research. Bagging is one of the most effective ensemble learning methods, yet it has drawbacks on highly imbalanced data. Sampling methods are considered t...
متن کاملA hybrid approach to learn with imbalanced classes using evolutionary algorithms
There is an increasing interest in application of Evolutionary Algorithms to induce classification rules. This hybrid approach can aid in areas that classical methods to rule induction have not been completely successful. One example is the induction of classification rules in imbalanced domains. Imbalanced data occur when some classes heavily outnumbers other classes. Frequently, classical Mac...
متن کاملCUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification
Class imbalance classification is a challenging research problem in data mining and machine learning, as most of the real-life datasets are often imbalanced in nature. Existing learning algorithms maximise the classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, the minority class instances are representing the concept with greater in...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Intell. Data Anal.
دوره 18 شماره
صفحات -
تاریخ انتشار 2014